Abstract

Background: Large clinical genomics studies using next generation DNA sequencing require the ability to select 
and track samples from a large population of patients through many experimental steps. With the number of 
clinical genome sequencing studies increasing, it is critical to maintain adequate laboratory information 
management systems to manage the thousands of patient samples that are subject to this type of genetic 
analysis. 

Results: To meet the needs of clinical population studies using genome sequencing, we developed a web-based
laboratory information management system (LIMS) with a flexible configuration that is adaptable to continuously
evolving experimental protocols of next generation DNA sequencing technologies. Our system, MendeLIMS, is easily
implemented with open source tools and is highly configurable and extensible. MendeLIMS has been invaluable in the
management of our clinical genome sequencing studies.

Conclusions: We maintain a publicly available demonstration version of the application for evaluation purposes
at http://mendelims.stanford.edu. MendeLIMS is programmed in Ruby on Rails (RoR) and accesses data stored in
SQL-compliant relational databases. Software is freely available for non-commercial use at
http://dna-discovery.stanford.edu/software/mendelims/.

Keywords: Next generation sequencing, Clinical studies, Laboratory information management, Pathology,
Genomics, Genetics



Background 

With next generation DNA sequencing (NGS) now a commonly adopted technology, the genetic analysis of large clinical populations has become practical and is widely used for identifying disease-related germline and somatic variants such as cancer mutations. The genetic variation from thousands of individuals can now be identified with NGS whole genome, exome, targeted and other resequencing approaches. Due to the dramatic increase in the number of NGS clinical genomics studies, it has become increasingly important to develop adequate laboratory information management systems (LIMS) to manage the thousands of patient samples that are subject to NGS analysis. Tracking and managing the clinical sample workflow involved in NGS analysis is an extremely difficult task, given the logistical issues of enrolling patients, fragmented procedures for acquisition of clinical study samples, complex molecular preparation steps and the intricacies of the NGS processing pipeline. Commercial systems are available but typically are high cost and require extensive modification to address the specific needs of biomedical research groups conducting genetic analysis on populations.

As a general and unique solution to the needs of managing the experimental workflow for clinical genome sequencing projects, we developed MendeLIMS, a web-based, robust and flexible solution for integrating the management of clinical study samples and NGS processes. With respect to genetic studies, MendeLIMS functionality can be grouped into four major categories: (i) enrollment of patients and acquisition of clinical study samples, (ii) sample assessment and processing, (iii) genomic analysis through preparation of next generation DNA sequencing libraries or other molecular assays such as microarrays and, finally, (iv) DNA sequencing of samples with associated quality control metrics. Tracking of sequencing steps is currently supported for the following Illumina NGS instruments: GAIIx, MiSeq, HiSeq, HiSeq2500 and NextSeq, but can easily be configured for any type of NGS instrument that follows a sequencing library to flow cell workflow. We maintain a publicly available demonstration version of the application for evaluation purposes at http://mendelims.stanford.edu.

Implementation 

MendeLIMS is written in Ruby using the open source web application framework Ruby on Rails (RoR), and the implementation is platform-independent. Instructions for installation are provided in Additional file 1. For our own in-house instance of MendeLIMS, our servers run Linux/Ubuntu and we use the MySQL relational database management system (RDBMS). The application is easily configured to use any other SQL RDBMS supported by RoR. Figure 1 shows a simplified database schema for the major tables. A more comprehensive schema is provided in Additional file 2.
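To illustrate how the core tables of Figure 1 let a sequencing lane be traced back to its source material, here is a minimal in-memory sketch. The table and foreign-key column names (samples.source_sample_id, processed_samples.sample_id, lib_samples.seq_lib_id, flow_lanes.seq_lib_id and so on) follow Figure 1; using plain Ruby Structs instead of Rails models, the `trace_lane` helper and the example values are assumptions for illustration only.

```ruby
# Hypothetical in-memory stand-ins for some of the Figure 1 tables; the real
# application uses Rails models backed by MySQL.
Sample          = Struct.new(:id, :patient_id, :source_sample_id, :barcode_key)
ProcessedSample = Struct.new(:id, :sample_id, :patient_id, :barcode_key)
SeqLib          = Struct.new(:id, :barcode_key, :lib_name)
LibSample       = Struct.new(:id, :seq_lib_id, :processed_sample_id)
FlowLane        = Struct.new(:id, :seq_lib_id, :sequencing_key, :lane_nr)

# Follows foreign keys from one flow lane back to the patient(s) it contains.
def trace_lane(lane, db)
  lib = db[:seq_libs].find { |l| l.id == lane.seq_lib_id }
  db[:lib_samples].select { |ls| ls.seq_lib_id == lib.id }.map do |ls|
    ps = db[:processed_samples].find { |p| p.id == ls.processed_sample_id }
    # Climb source_sample_id links from the dissection up to the source sample.
    chain = []
    sample = db[:samples].find { |s| s.id == ps.sample_id }
    while sample
      chain.unshift(sample.barcode_key)
      parent_id = sample.source_sample_id
      sample = parent_id && db[:samples].find { |s| s.id == parent_id }
    end
    { lane: "#{lane.sequencing_key}/#{lane.lane_nr}", library: lib.barcode_key,
      extraction: ps.barcode_key, sample_chain: chain, patient_id: ps.patient_id }
  end
end

# Illustrative data only.
db = {
  samples: [Sample.new(1, 617, nil, "2722"), Sample.new(2, 617, 1, "2722A")],
  processed_samples: [ProcessedSample.new(10, 2, 617, "2722AD01")],
  seq_libs: [SeqLib.new(100, "L000760", "pat617_WGS")],
  lib_samples: [LibSample.new(1000, 100, 10)],
}
lane = FlowLane.new(5000, 100, "20111109_SH1_0099", 2)
p trace_lane(lane, db)
```

In a Rails application the same walk would normally be expressed with `belongs_to`/`has_many` associations rather than explicit `find` calls.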

The web interface is designed to handle a variety of queries in a modular format (Figure 2). To facilitate consistent data entry, MendeLIMS uses drop-down lists for seamless data validation whenever possible. The drop-down lists themselves are configurable by users with the appropriate authorization. Examples of user-configurable items include sample types, sequencing library multiplexing schemes, alignment references and DNA sequencers. All of the features are described in the user's manual (Additional file 3).
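The drop-down validation described above can be sketched as a set of named lists that only an authorized role may extend, against which every submitted record is checked. This is a hypothetical illustration, assuming a modern Ruby (2.7+): the `PickLists` class, the list names and the choice of 'clin_admin' (described under Security) as the editing role are assumptions, not MendeLIMS code.

```ruby
# Hypothetical registry of user-configurable drop-down lists.
class PickLists
  EDITOR_ROLE = "clin_admin" # assumed role allowed to edit the lists

  def initialize
    @lists = {}
  end

  # Adds a value to a list; only the editor role may do this.
  def add_value(list, value, user_roles)
    raise ArgumentError, "not authorized to edit #{list}" unless user_roles.include?(EDITOR_ROLE)
    values = (@lists[list] ||= [])
    values << value unless values.include?(value)
  end

  # Values offered in the drop-down, in insertion order.
  def options(list)
    @lists.fetch(list, []).dup
  end

  # Returns error messages for fields whose value is not in the configured list;
  # fields without a configured list are not checked.
  def validate(record)
    record.filter_map do |field, value|
      next unless @lists.key?(field)
      "#{field}: '#{value}' is not an allowed value" unless @lists[field].include?(value)
    end
  end
end

lists = PickLists.new
lists.add_value(:sample_type, "Tissue", ["clin_admin"])
lists.add_value(:sample_type, "Blood", ["clin_admin"])
errors = lists.validate({ sample_type: "Tisue", owner: "admin" })
p errors
```

Keeping the allowed values as data rather than code is what lets an authorized user change them from the web interface without a redeployment.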

The look and feel of the application is easily changed or customized, since all web pages inherit styles from an application-wide cascading style sheet (CSS) and, in keeping with RoR convention, overall page layout and navigation are controlled by a single HTML layout file.

Sample nomenclature 

To enable accurate tracking of samples from their initial acquisition, through all key intermediate steps and ultimately to DNA sequencing, we developed a sample labeling nomenclature loosely based on the scheme employed by The Cancer Genome Atlas (TCGA) project (https://wiki.nci.nih.gov/display/TCGA/Working+with+TCGA+Data). We maintain the original unique sample barcode, and add successive suffixes to indicate processing performed.
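The suffix scheme can be sketched as follows. The concrete rules are assumptions inferred from barcodes visible in Figure 2 (a source sample 6493, dissections 6493A and 6493B, and extractions such as 6493AD01 for genomic DNA and 6493AP01 for protein); the letter codes, helper names and regular expression are hypothetical and are not the MendeLIMS specification.

```ruby
# Assumed extraction type codes (D and P appear in Figure 2; R for RNA is a guess).
EXTRACTION_CODES = { dna: "D", rna: "R", protein: "P" }.freeze

# Dissection n (0-based) of a source sample gets one letter: A, B, C, ...
def dissection_barcode(source_barcode, index)
  source_barcode + ("A".ord + index).chr
end

# An extraction appends its type letter and a two-digit sequence number.
def extraction_barcode(parent, type, seq)
  format("%s%s%02d", parent, EXTRACTION_CODES.fetch(type), seq)
end

# Numeric source barcode, optional dissection letter, optional extraction suffix.
BARCODE_PATTERN = /\A(?<source>\d+)(?<dissection>[A-Z])?(?<extraction>[A-Z]\d{2})?\z/

# Rebuilds the lineage of a barcode, oldest ancestor first.
def lineage(barcode)
  m = BARCODE_PATTERN.match(barcode) or raise ArgumentError, "unrecognised barcode: #{barcode}"
  chain = [m[:source]]
  chain << chain.last + m[:dissection] if m[:dissection]
  chain << chain.last + m[:extraction] if m[:extraction]
  chain
end

p lineage("6493AD01")
```

Because every derived barcode starts with its parent's barcode, lineage can be recovered from the label alone, which is what makes the nomenclature useful on tubes and in file names.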

Acquisition of clinical study samples 

After enrollment into a study, patient samples and their characteristics are entered into MendeLIMS through a web interface (Figure 2). A unique identifier (ID) is assigned to each new clinical sample. The user has the option of entering sample-relevant clinical data, including pathology information from clinical reports, digital images originating from pathology slides and general clinical information about the patient (Figure 3). For efficient subsequent retrieval of the physical samples, the storage freezer and container location is entered using a standard nomenclature. If email triggers are configured, an email giving details of any new sample entered into the system is automatically sent to an identified central coordinator and/or to a specified owner for the particular clinical trial. The web interface enables sample entry to occur at any location, thus facilitating sample entry by various researchers and clinical coordinators.
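The email trigger can be sketched as a function that decides whether a notice is due and to whom. This is a hypothetical sketch: the configuration fields, message layout and example addresses are assumptions, and actual delivery (which in a Rails application would typically go through a mailer such as Action Mailer) is left out.

```ruby
# Hypothetical configuration: a switch, an optional central coordinator and
# optional per-protocol owners (the "and/or" in the text).
NotificationConfig = Struct.new(:enabled, :coordinator_email, :protocol_owners)
Notice = Struct.new(:to, :subject, :body)

# Builds (but does not send) the message for a newly entered sample.
def new_sample_notice(config, sample)
  return nil unless config.enabled
  recipients = [config.coordinator_email, config.protocol_owners[sample[:protocol]]].compact.uniq
  return nil if recipients.empty?
  body = sample.map { |field, value| "#{field}: #{value}" }.join("\n")
  Notice.new(recipients, "New sample #{sample[:barcode]} entered", body)
end

config = NotificationConfig.new(true, "coordinator@example.org", { "NCCC" => "owner@example.org" })
notice = new_sample_notice(config, { barcode: "6807", protocol: "NCCC", sample_type: "Tissue" })
puts notice.subject
```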

Clinical study sample assessment and processing 

Any manipulation of clinical study samples is tracked (Figure 3). This includes dissection of tissue samples and nucleic acid extraction. Details of these sample workflow operations are stored in MendeLIMS, including volumes of any extracted macromolecules such as genomic DNA, concentration metrics and sample storage location. This greatly facilitates the management of these precious resources for population studies.
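Because volumes and concentrations are stored, the amount of material left in an extract follows from the usual relation mass (ng) = volume (µL) × concentration (ng/µL). The class below is a hypothetical sketch of how withdrawals for downstream steps could be checked against what remains; its name, units and overdraw rule are assumptions.

```ruby
# Hypothetical record of one extract (e.g. a genomic DNA aliquot).
class Extract
  attr_reader :barcode, :volume_ul, :conc_ng_per_ul

  def initialize(barcode, volume_ul, conc_ng_per_ul)
    raise ArgumentError, "concentration must be positive" unless conc_ng_per_ul.to_f > 0
    @barcode = barcode
    @volume_ul = volume_ul.to_f
    @conc_ng_per_ul = conc_ng_per_ul.to_f
  end

  # mass (ng) = volume (uL) x concentration (ng/uL)
  def mass_ng
    @volume_ul * @conc_ng_per_ul
  end

  # Volume (uL) that delivers the requested mass.
  def volume_for(ng)
    ng / @conc_ng_per_ul
  end

  # Deducts the volume used for a downstream step; refuses to overdraw.
  def withdraw(ng)
    needed = volume_for(ng)
    if needed > @volume_ul
      raise ArgumentError, "#{@barcode}: need #{needed.round(2)} uL, #{@volume_ul.round(2)} uL left"
    end
    @volume_ul -= needed
    needed
  end
end

dna = Extract.new("6493AD01", 50, 20)
dna.withdraw(500) # uses 25.0 uL, leaving 25.0 uL
```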

Tracking molecular and genomic analysis 

Molecular assays and sequencing library steps are also captured in MendeLIMS (Figure 3). Sequencing libraries may be entered as singleplex (i.e. one sample per library) or multiplex (i.e. multiple samples per library, with each sample tagged with a unique starting sequence). The multiplex indexing schemes are user-configurable, both for the number of samples that can be multiplexed on one lane and for the actual starting sequences used.
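A configurable indexing scheme is what lets the system reject an invalid pool before it reaches a flow cell. The sketch below is hypothetical and assumes Ruby 2.7+ (for `tally`): the `MultiplexScheme` structure, the tag names and the checks are illustrative, and the short tag sequences only echo those shown in Figure 4.

```ruby
# Hypothetical multiplex scheme: a name, the maximum number of samples per
# lane, and tag name => index (starting) sequence.
MultiplexScheme = Struct.new(:name, :max_samples, :tags)

# Checks a proposed pool (sample barcode => tag name) against the scheme.
def validate_pool(scheme, assignments)
  errors = []
  if assignments.size > scheme.max_samples
    errors << "#{assignments.size} samples exceed the #{scheme.name} limit of #{scheme.max_samples}"
  end
  unknown = assignments.values.reject { |tag| scheme.tags.key?(tag) }.uniq
  errors << "unknown tags: #{unknown.join(', ')}" unless unknown.empty?
  reused = assignments.values.tally.select { |_, n| n > 1 }.keys
  errors << "tags used more than once: #{reused.join(', ')}" unless reused.empty?
  errors
end

scheme = MultiplexScheme.new("3-base tags", 4,
                             { "T1" => "ACT", "T2" => "CGT", "T3" => "GTT", "T4" => "TAT" })
p validate_pool(scheme, { "2722AD01" => "T1", "2736AD01" => "T2" })
```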

Tracking the next generation sequencing workflow 

[Figure 1 image: "MendeLIMS - Core Tables" diagram showing the tables patients, sample_characteristics, pathologies, samples, histologies, processed_samples, molecular_assays, seq_libs, lib_samples, flow_cells, flow_lanes and align_qc, with their key columns]

Figure 1 Database schema for MendeLIMS. Main entities and their relationships are shown in this diagram, and a complete schema showing other ancillary tables is provided in the supplementary material.

In preparation for initiation of an NGS analysis, a sequencing run is entered into the system by selecting existing libraries and placing them into separate lanes or partitions. Normally an entire sequencing run is entered. However, the system is also able to handle partial sequencing runs, to accommodate the scenario where sequencing is performed as a service and the run is shared between multiple groups who are not privy to each other's results. MendeLIMS generates a unique sequencing run key based on the sequencing date, the sequencing machine and a unique sequential run number. Once the sequencing run has completed, the initial quality control (QC) metrics for the run can be entered into the system. This is currently handled by an offline Ruby script, but will in future be incorporated into the web application. MendeLIMS supports any type of sequencing application, including whole genome, exome, targeted and RNA-based sequencing studies. The system stores sequencing library and sample lineage, flow cell composition and sequencing run metadata, along with run status and QC metrics for all runs (Figure 4). The sequencing data files, for example BAM alignment files or VCF variant calling files, are not stored in MendeLIMS per se but on a storage cluster accessible to all researchers in the group. Additionally, since we use the MendeLIMS sequencing run key and sequencing library/sample nomenclature in the analysis directory and file names, the files are easily cross-referenced between MendeLIMS and the storage cluster.
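A sketch of the run-key and file-naming conventions follows, assuming Ruby 2.5+. The key format (YYYYMMDD_machine_NNNN) is modelled on the run key 20111109_SH1_0099 visible in Figure 4, and the directory layout `<root>/<run key>/L<lane>_<library barcode>.<ext>` is a hypothetical example of embedding LIMS identifiers in file names; neither is documented as the exact MendeLIMS format.

```ruby
require "date"

# Run key: sequencing date, machine code and zero-padded sequential run number.
def run_key(date, machine, run_nr)
  format("%s_%s_%04d", date.strftime("%Y%m%d"), machine, run_nr)
end

RUN_KEY_PATTERN = /\A(?<date>\d{8})_(?<machine>[A-Za-z0-9]+)_(?<nr>\d{4})\z/

def parse_run_key(key)
  m = RUN_KEY_PATTERN.match(key) or raise ArgumentError, "bad run key: #{key}"
  { date: Date.strptime(m[:date], "%Y%m%d"), machine: m[:machine], run_nr: m[:nr].to_i }
end

# Hypothetical layout: <root>/<run key>/L<lane>_<library barcode>.<ext>
def data_file_path(root, key, lane_nr, lib_barcode, ext)
  File.join(root, key, "L#{lane_nr}_#{lib_barcode}.#{ext}")
end

# Recovers the LIMS lookup keys from a path that follows the layout above.
def lims_keys_from_path(path)
  lane, lib = File.basename(path, ".*").split("_", 2)
  { run_key: File.basename(File.dirname(path)), lane_nr: lane.delete_prefix("L").to_i,
    lib_barcode: lib }
end

key = run_key(Date.new(2011, 11, 9), "SH1", 99)
path = data_file_path("/data/seq", key, 2, "L000760", "bam")
p lims_keys_from_path(path)
```

The round trip (key to path and back) is the point of the convention: a file found on the storage cluster can be looked up in the LIMS without any separate index.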



Queries of MendeLIMS 

All queries allow specification of multiple filter criteria, such as barcode range, date range, owner and protocol, which enables users to quickly find the samples of interest and then drill down to more detail. For example, when viewing the sample query result set (Figure 2), clicking on the sample barcode brings up more comprehensive information regarding the sample, including pathology information. Clicking on the 'QC' link in a sequencing library query result set shows QC data for all sequencing runs for that library. Query results may be exported to a tab-delimited file for review or for incorporation with other local data.
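Multi-criteria filtering followed by tab-delimited export can be sketched as below. The row fields and filter names are illustrative assumptions rather than the MendeLIMS schema, and comparing the barcode range on the numeric source-sample prefix is likewise an assumption.

```ruby
require "date"

# Illustrative result row; real MendeLIMS queries run against the database.
SampleRow = Struct.new(:barcode, :date_entered, :owner, :protocol)

# Applies every filter that was supplied; nil means "any". The barcode range is
# compared on the numeric source-sample prefix ("6493A".to_i == 6493).
def query_samples(rows, barcode_range: nil, date_range: nil, owner: nil, protocol: nil)
  rows.select do |r|
    (barcode_range.nil? || barcode_range.cover?(r.barcode.to_i)) &&
      (date_range.nil? || date_range.cover?(r.date_entered)) &&
      (owner.nil? || r.owner == owner) &&
      (protocol.nil? || r.protocol == protocol)
  end
end

# Serialises rows as tab-delimited text; tabs/newlines inside values become spaces.
def to_tsv(rows)
  lines = [SampleRow.members.map(&:to_s)]
  rows.each { |r| lines << r.to_a.map { |v| v.to_s.tr("\t\n", "  ") } }
  lines.map { |cols| cols.join("\t") }.join("\n") + "\n"
end

rows = [
  SampleRow.new("6493A", Date.new(2011, 3, 2), "admin", "NCCC"),
  SampleRow.new("6807A", Date.new(2011, 3, 10), "researcher1", "NCCC"),
]
puts to_tsv(query_samples(rows, barcode_range: 6000..6500, protocol: "NCCC"))
```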
[Figure 2 image: screenshots of the "Source and Dissected Samples Query" page, with query parameters such as sample barcode, consent protocol, clinic, race, sample site, preservation and date range selected from drop-down lists, and of the resulting "Source and Dissected Samples - Processing Tree", which lists source samples, dissections and extracted genomic DNA, nucleic acid and protein samples with their storage locations]

Figure 2 Query web interfaces for MendeLIMS. Database queries are managed by a series of web pages that have a modular format. Different search parameters are included, with drop-down menus used for standardized search terminology. Based on the needs of any given group, the search interface can be easily modified to accommodate new search or entry functions.



[Figure 3 image: "MendeLIMS Process Flow" diagram with the steps Receive sample from clinic; Acquire pathology report and diagnosis; Pathology images such as tissue sections; Dissect sample; Perform extraction; Perform molecular assays such as microarrays; Prepare sequencing library; Prepare flow cell for sequencing; Sequence sample(s); Store sequencing quality metrics; Align sequence vs reference genome; Subsequent variant calling]

Figure 3 Workflow of MendeLIMS. Multiple steps of the sample acquisition workflow for clinical studies are fully integrated with next generation sequencing or genomic assay procedures. This allows one to trace the genomic analysis of any given sample.
[Figure 4 image: screenshots of the "View Sequencing Run" page for run 20111109_SH1_0099 (HiSeq), listing the lane, library barcode, library name, adapter, alignment reference and library concentration for each lane, and of the "Sequencing Library" page for a multiplexed library, listing its tag sequences, source libraries, patient IDs and source DNA, and the sequencing lanes on which it was run]

Figure 4 Tracking the sequencing of clinical samples. One can follow a clinical sample from enrollment in a clinical study all the way through to its sequencing. For example, from the sequencing run composition one can trace back to the individual libraries and the original source DNA. Screen shots show the various levels of querying. A sequencing library can be queried for additional information. When required, it is even possible to determine the time of enrollment in a study and pull up relevant images from pathology.



Reagent tracking 

Reagent, equipment and supplies ordering, though not typically part of a LIMS implementation, has been included in MendeLIMS. This feature enables, for example, the tracking of reagent and supply batches, which is useful in troubleshooting failed sequencing runs, or the tracking of all expenditures against a specific funding account.
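As an example of how batch tracking helps with troubleshooting, the sketch below links reagent lots to runs and ranks the lots seen in failed runs by failure rate. The `RunRecord` structure, reagent names, lot numbers and run keys are hypothetical; MendeLIMS's actual ordering tables are not shown in this paper.

```ruby
# Hypothetical run record: run key, outcome and reagent name => lot number.
RunRecord = Struct.new(:run_key, :status, :lots)

# For each (reagent, lot) pair seen in at least one failed run, returns
# [reagent, lot, failed_runs, total_runs], highest failure rate first
# (ties broken by the larger number of runs, then by name).
def lots_in_failed_runs(runs)
  stats = Hash.new { |h, k| h[k] = { failed: 0, total: 0 } }
  runs.each do |run|
    run.lots.each do |reagent, lot|
      s = stats[[reagent, lot]]
      s[:total] += 1
      s[:failed] += 1 if run.status == :failed
    end
  end
  stats.select { |_, s| s[:failed] > 0 }
       .sort_by { |key, s| [-s[:failed].fdiv(s[:total]), -s[:total], key] }
       .map { |(reagent, lot), s| [reagent, lot, s[:failed], s[:total]] }
end

runs = [
  RunRecord.new("20111109_SH1_0099", :ok,     { "cluster_kit" => "C1", "seq_kit" => "S1" }),
  RunRecord.new("20111116_SH1_0100", :failed, { "cluster_kit" => "C1", "seq_kit" => "S2" }),
  RunRecord.new("20111123_SH1_0101", :failed, { "cluster_kit" => "C2", "seq_kit" => "S2" }),
]
lots_in_failed_runs(runs).each { |row| p row }
```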

Security 

Hypertext Transfer Protocol Secure (HTTPS) is supported and is currently implemented for user login pages, but is easily extended to other pages as needed. User authentication is via a user ID and password, and access to functionality is controlled via user roles, which are defined and managed from the website by a user with the 'admin' role. Other available roles include 'clinical', which allows create/modify access to clinical study sample information; 'clin_admin', which allows modification of the drop-down lists used for validation of sample data; and 'researcher', which allows create/modify access to sequencing libraries and sequencing runs. The user manual in the supplementary material describes all available user roles.
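Role-based access of this kind reduces to a mapping from roles to permitted actions plus a guard that every create/modify action calls. In the sketch below the role names come from the text, while the permission symbols, the `User` structure and the helpers are hypothetical.

```ruby
# Hypothetical permission sets for the roles described above.
PERMISSIONS = {
  "admin"      => %i[manage_users manage_roles],
  "clinical"   => %i[edit_clinical_samples view_clinical_id],
  "clin_admin" => %i[edit_drop_down_lists view_clinical_id],
  "researcher" => %i[edit_seq_libs edit_seq_runs],
}.freeze

User = Struct.new(:login, :roles)

# A user may hold several roles; any one granting the permission suffices.
def allowed?(user, permission)
  user.roles.any? { |role| PERMISSIONS.fetch(role, []).include?(permission) }
end

# Guard to call at the top of any create/modify action.
def authorize!(user, permission)
  raise SecurityError, "#{user.login} may not #{permission}" unless allowed?(user, permission)
end

tech = User.new("jdoe", ["researcher"])
authorize!(tech, :edit_seq_runs) # passes silently
```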

Grimes and Ji BMC Bioinformatics 2014, 15:290
http://www.biomedcentral.com/1471-2105/15/290

Given the extreme complexity of dealing with protected health information (PHI), MendeLIMS is not designed to incorporate PHI-related clinical data. MendeLIMS does store a patient identifier that is the link between MendeLIMS and other patient clinical information databases, which are securely stored in a very limited access environment. The identifier is stored as a binary encrypted field in the MySQL database, and access to this field via the web application is limited to users with a 'clinical' or 'clin_admin' role; other users only see a unique system-generated patient identifier which, for all intents and purposes, is anonymous.
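Storing the identifier as an encrypted binary value (compare the varbinary column in the patients table of Figure 1) and masking it by role can be sketched with Ruby's standard OpenSSL bindings. The paper does not say which cipher MendeLIMS uses: AES-256-GCM, the iv|tag|ciphertext byte layout, the helper names and the example identifier are all assumptions, and in practice the key would come from a secret store rather than be generated in place.

```ruby
require "openssl"
require "securerandom"

IV_LEN  = 12 # standard GCM nonce length
TAG_LEN = 16 # default GCM authentication tag length

# Returns iv | auth tag | ciphertext as one binary string for a VARBINARY column.
def encrypt_id(plain, key)
  cipher = OpenSSL::Cipher.new("aes-256-gcm").encrypt
  cipher.key = key
  iv = cipher.random_iv
  ciphertext = cipher.update(plain) + cipher.final
  iv + cipher.auth_tag + ciphertext
end

# Raises OpenSSL::Cipher::CipherError if the key is wrong or the blob was altered.
def decrypt_id(blob, key)
  iv = blob.byteslice(0, IV_LEN)
  tag = blob.byteslice(IV_LEN, TAG_LEN)
  ciphertext = blob.byteslice(IV_LEN + TAG_LEN, blob.bytesize)
  decipher = OpenSSL::Cipher.new("aes-256-gcm").decrypt
  decipher.key = key
  decipher.iv = iv
  decipher.auth_tag = tag
  (decipher.update(ciphertext) + decipher.final).force_encoding(Encoding::UTF_8)
end

CLINICAL_ROLES = %w[clinical clin_admin].freeze

# Clinical roles see the decrypted identifier; everyone else sees the system ID.
def patient_label(patient, user_roles, key)
  if (user_roles & CLINICAL_ROLES).any?
    decrypt_id(patient[:clinical_id_encrypted], key)
  else
    "Patient #{patient[:id]}"
  end
end

key = SecureRandom.random_bytes(32) # illustration only; load from a secret store
patient = { id: 16, clinical_id_encrypted: encrypt_id("MRN-0000123", key) }
puts patient_label(patient, ["researcher"], key)
```

An authenticated mode such as GCM is chosen here so that a tampered or mis-keyed value fails loudly instead of decrypting to garbage.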

In our current implementation, used by several groups at Stanford University, MendeLIMS is integrated into an internal network within a secured firewall. All database transactions are logged and time-stamped to provide an audit trail, and automated database backups are run daily. An administrator can readily generate an audit report to keep track of changes made by users.
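An append-only, time-stamped log with a per-user report is the core of such an audit trail. The sketch below is hypothetical: the entry fields, the report line format and the example users are assumptions, and MendeLIMS's own audit storage may differ.

```ruby
require "time"

# Hypothetical audit entry: when, who, which table/record, what action, what changed.
AuditEntry = Struct.new(:at, :user, :table, :record_id, :action, :changes)

# Append-only log: entries can be added and reported on, never edited.
class AuditLog
  def initialize
    @entries = []
  end

  def record(user, table, record_id, action, changes = {}, at: Time.now.utc)
    @entries << AuditEntry.new(at, user, table, record_id, action, changes.dup.freeze).freeze
    self
  end

  # One line per change, optionally filtered by user and time window.
  def report(user: nil, from: nil, to: nil)
    @entries.select do |e|
      (user.nil? || e.user == user) && (from.nil? || e.at >= from) && (to.nil? || e.at <= to)
    end.map { |e| "#{e.at.iso8601} #{e.user} #{e.action} #{e.table}##{e.record_id} #{e.changes}" }
  end
end

log = AuditLog.new
t = Time.utc(2011, 3, 2, 9, 0, 0)
log.record("admin", "samples", 16, :update, { barcode_key: %w[6493 6493A] }, at: t)
log.record("researcher1", "seq_libs", 100, :create, at: t + 3600)
puts log.report(user: "admin")
```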

Discussion 

There are commercial LIMS solutions available for NGS labs, some of which have been implemented at major genomic research centers. For example, GeneSifter LAB Edition [1] has been implemented at Vanderbilt University, Progeny LIMS [2] at the University of Pittsburgh and Clarity LIMS [3] at the University of Washington. These systems have significant capabilities. However, the cost in time and money to implement them is often out of reach for smaller organizations, particularly those that rely on funding from research grants or that require unique workflows that cannot be implemented readily in a system designed for a larger institution. Given these resource constraints, open source options are of greater interest to this category of organizations.

There are several simpler LIMS covering clinical study samples, such as BonsaiLIMS [4], PASSIM [5] and SLIMS [6]. These LIMS offer basic sample management but do not offer comprehensive clinical sample tracking or the ability to define sequencing libraries and flow cell/sequencing run composition for NGS processing. More recent offerings that are available for open source installation and do support NGS processing include Galaxy LIMS [7] and GNomEx LIMS [8]. These systems address the flow from DNA/RNA extraction to sequencing library to flow cell/sequencing run. GNomEx LIMS also provides some analysis workflow capability and integrates data visualization via genome browsers such as the UCSC Genome Browser [9] or the Integrative Genomics Viewer (IGV) [10]. Galaxy LIMS takes advantage of the Galaxy infrastructure to also provide analysis workflow and data visualization. However, none of these systems natively provides tracking of clinical study data such as consent protocols, pathology and histopathology information. Sample lineage and sample tracking via consistent nomenclature, drill-down to various levels of source data, and freezer container/location information are also not addressed.

Table 1 Comparison among different LIMS systems

LIMS software                                           MendeLIMS   GNomEx      Galaxy LIMS QTREDS
Clinical study patient samples
  Patient data (gender/race, MRN, pathology, histology) Yes         No          No          No
  Sample processing (dissections, extractions)          Yes         No          No          Yes¹
  Sample location tracking                              Yes         No          No          Yes
Arrays, libraries, sequencing runs
  Molecular assays (genomic arrays, ddPCR, ...)         Yes         Yes         No          Yes
  Sequencing library prep (singleplex and multiplex)    Yes         Yes         Yes         Yes²
  Flow cell/sequencing run setup                        Yes         Yes         Yes         No
Post-sequencing analysis
  Sequencing QC                                         Yes         nd          Yes³        No
  Analysis workflow                                     No          Yes         Yes³        No
Security/Audit trail
  Authorization via user roles                          Yes         Yes         Yes         Yes
  Audit trail                                           Yes         Yes         nd          nd
  HTTPS/SSL security                                    Yes         nd          nd          Yes
General
  Email/notification capability                         Yes         Yes         Yes³        Yes
  Customizable lists for data validation                Yes         Yes         Yes         Yes
  Instrument integration                                No          No          Yes⁴        No
  Attach files to samples/libraries/sequencing runs     Yes         nd          No          nd
  Visualization of results                              No          Yes⁵        Yes³        No
Other
  Project-based billing                                 No          Yes         No          No
  Reagent inventory management                          No          No          No          Yes
  Publicly available User Guide/Demos                   Yes         Yes         No          Yes

¹ Detailed information tracked regarding sample prep; ² singleplex libraries only; ³ functionality provided via integration with Galaxy and genome browsers; ⁴ integration with HiSeq 2000 only; ⁵ functionality provided via integration with GenoPub.
nd: could not be determined from public documentation whether the functionality is provided.

Another open source option is QTREDS [11]. This LIMS has a strong focus on experimental protocols for sample preparation and tracks detailed steps that MendeLIMS and other systems do not specifically track, such as sonication, end repair or ligation as part of exome library preparation. QTREDS also manages inventory of reagents for sample preparation and triggers low stock level alerts. However, there is no tracking associated with clinical study samples, and NGS support is limited to the sequencing libraries and their associated sequencing status, rather than flow cell composition and the sequencing run itself. In contrast, MendeLIMS provides full sample lineage tracking back to the patient and to the clinic and consent protocol where the sample originated, as well as support for all major processing steps through to DNA/RNA sequencing and QC (Table 1). Additional useful tracking functionality includes recording which cluster and sequencing kit versions were used for a particular run, and which publications (if any) reference the results from that run.

Conclusions 

Clinical population studies using NGS require management of hundreds if not thousands of samples, including various intermediate processing steps, and the resulting sequencing data. A LIMS is critical to the effective management of these data and the generation of reproducible results. The currently available open source or commercial systems may meet the needs of some research groups; however, for those groups where the time and monetary cost of a comprehensive commercial system is prohibitive, there is no end-to-end open source solution covering enrollment of patients in a clinical study through genome sequencing analysis. Our system addresses all of these needs, with a specific focus on, and seamless integration of, clinical study enrollment through to NGS.

In MendeLIMS, all data are consolidated into one authoritative, centrally accessible repository, eliminating multiple distributed spreadsheets. Samples are traceable from a lane on a sequencing run back to the patient diagnosis, pathology and all processing steps in between. In conjunction with a standard barcoding nomenclature and flexible query capability, this significantly reduces errors in sample tracking, provides a comprehensive view of the data being sequenced, and has made MendeLIMS an invaluable tool for the management of our clinical sequencing studies.

Availability and requirements 

- Project name: MendeLIMS

- Project home page: http://dna-discovery.stanford.edu/software/mendelims/

- Project demo site: http://mendelims.stanford.edu/ 

- Operating system(s): Platform independent 

- Programming Language(s): Ruby, Ruby on Rails, HTML, JavaScript

- Server requirements: Apache2, Mongrel or Passenger, Ruby 1.9.3+, Rails 3.2.x, MySQL 5.0

- Web browser requirements: Firefox, Chrome, IE, Safari

- License: Any restrictions to use by non-academics: 
None 

Additional files 



Additional file 1: An installation guide for MendeLIMS.
Additional file 2: MendeLIMS database schema diagram.
Additional file 3: User's guide for MendeLIMS.